How Can We Analyze Differentially-Private Synthetic Datasets?

Author

Anne-Sophie Charest

Abstract

Synthetic datasets generated within the multiple imputation framework are now commonly used by statistical agencies to protect the confidentiality of their respondents. More recently, researchers have also proposed techniques to generate synthetic datasets that offer the formal guarantee of differential privacy. While combining rules have been derived for the first type of synthetic datasets, little has been said about the analysis of differentially-private synthetic datasets generated with multiple imputations. In this paper, we show that the usual combining rules cannot be used to analyze synthetic datasets that have been generated to achieve differential privacy. We consider specifically the case of generating synthetic count data with the beta-binomial synthesizer, and illustrate our discussion with simulation results. We also propose, as a simple alternative, a Bayesian model that explicitly models the mechanism for synthetic data generation.
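
As a rough illustration of what a beta-binomial synthesizer for count data can look like, the sketch below draws several synthetic counts from a beta-binomial posterior predictive distribution. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `beta_binomial_synthesize`, the prior parameters `alpha1` and `alpha2`, and the number of synthetic datasets `m` are all hypothetical choices made for this example. The abstract does not say which prior parameters achieve a given level of differential privacy, so the code makes no privacy claim.

```python
import numpy as np

def beta_binomial_synthesize(x, n, alpha1, alpha2, m=5, rng=None):
    """Draw m synthetic counts out of n from a beta-binomial model.

    Assumptions (not taken from the abstract): a Beta(alpha1, alpha2)
    prior on the success probability, and synthetic datasets of the
    same size n as the observed data.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: draw one success probability per synthetic dataset from
    # the Beta posterior given x successes out of n trials.
    p = rng.beta(alpha1 + x, alpha2 + n - x, size=m)
    # Step 2: draw one synthetic count per sampled probability.
    return rng.binomial(n, p)

# Hypothetical usage: 30 successes out of 100, uniform Beta(1, 1) prior.
synthetic_counts = beta_binomial_synthesize(30, 100, 1.0, 1.0, m=5)
```

Because each synthetic count is drawn with its own sampled probability, the `m` counts carry both the posterior uncertainty and the binomial sampling noise, which is the multiple-imputation structure the abstract refers to.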

Similar resources

Thesis Proposal: Creation and Analysis of Differentially-Private Synthetic Datasets

Statistical agencies are faced with two conflicting objectives: protecting the privacy of their respondents, and providing researchers and policy makers with useful data. There exists a large body of literature on Statistical Disclosure Limitation (SDL) techniques, describing and evaluating methods for statistical agencies to share collected information with users while satisfying their confident...

Differentially Private Synthesization of Multi-Dimensional Data using Copula Functions

Differential privacy has recently emerged in private statistical data release as one of the strongest privacy guarantees. Most of the existing techniques that generate differentially private histograms or synthetic data only work well for single dimensional or low-dimensional histograms. They become problematic for high dimensional and large domain data due to increased perturbation error and c...

On the Privacy Properties of Variants on the Sparse Vector Technique

The sparse vector technique is a powerful differentially private primitive that allows an analyst to check whether queries in a stream are greater or lesser than a threshold. This technique has a unique property – the algorithm works by adding noise with a finite variance to the queries and the threshold, and guarantees privacy that only degrades with (a) the maximum sensitivity of any one quer...

Final Document: Improving Utility of Differentially Private Confidence Intervals

A differentially private randomized algorithm, M, is one meeting the requirement that given two neighboring datasets d and d′, that is, datasets that differ in no more than one row, and a set of outcomes S, the condition Pr[M(d) ∈ S] ≤ e^ε · Pr[M(d′) ∈ S] holds for some ε ≥ 0. Differentially private algorithms run on datasets can provide the guarantee that the information of any one co...

Personalized and Private Peer-to-Peer Machine Learning

The rise of connected personal devices together with privacy concerns call for machine learning algorithms capable of leveraging the data of a large number of agents to learn personalized models under strong privacy requirements. In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with provable converg...

Publication date: 2011
